976 results for statistical significance


Relevance: 100.00%

Abstract:

Sequential firings with fixed time delays are frequently observed in simultaneous recordings from multiple neurons. Such temporal patterns are potentially indicative of underlying microcircuits and it is important to know when a repeatedly occurring pattern is statistically significant. These sequences are typically identified through correlation counts. In this paper we present a method for assessing the significance of such correlations. We specify the null hypothesis in terms of a bound on the conditional probabilities that characterize the influence of one neuron on another. This method of testing significance is more general than the currently available methods since under our null hypothesis we do not assume that the spiking processes of different neurons are independent. The structure of our null hypothesis also allows us to rank order the detected patterns. We demonstrate our method on simulated spike trains.
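The counting-and-testing idea above can be illustrated with a small sketch. Everything below is an illustrative assumption rather than the paper's exact formulation: the delayed-pair counting scheme, the delay tolerance, and the bound `p0` on the conditional probability of one neuron firing after another are all made up for the example. The point it shows is that if the conditional influence of A on B is bounded by `p0` under the null, the pair count is dominated by a binomial, so a binomial tail gives a conservative significance bound.

```python
import bisect
import math

def count_delayed_pairs(spikes_a, spikes_b, delay, tol):
    """Count spikes of neuron A that are followed by at least one spike
    of neuron B at the given delay (within +/- tol)."""
    b = sorted(spikes_b)
    count = 0
    for t in spikes_a:
        lo = bisect.bisect_left(b, t + delay - tol)
        hi = bisect.bisect_right(b, t + delay + tol)
        if hi > lo:
            count += 1
    return count

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    delayed pairs when each A-spike is followed by a B-spike with
    probability at most p."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical data: if the null bounds the conditional probability of
# B following A by p0 = 0.1, then 2 delayed pairs out of 3 A-spikes
# gives the tail bound below.
spikes_a, spikes_b = [0.0, 10.0, 20.0], [5.0, 15.0]
c = count_delayed_pairs(spikes_a, spikes_b, delay=5.0, tol=0.5)
p_value_bound = binomial_tail(len(spikes_a), c, 0.1)  # c == 2, bound = 0.028
```

A small bound here would let one reject the compound null (independence *and* weak dependence up to `p0`), which is the extra generality the abstract emphasizes.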

Relevance: 100.00%

Abstract:

Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds, on the non-overlapped frequency, beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event-types are not allowed to repeat. It is then pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulation studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated. (C) 2014 Elsevier Inc. All rights reserved.
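As a concrete illustration of the frequency measure involved, here is a minimal sketch of non-overlapped counting for a serial episode (the simplest partial order); the event sequence and episode are made up, and the general partial-order case treated in the paper is considerably more involved. Two occurrences are non-overlapped when neither interleaves with the other, so a single automaton that resets after each complete occurrence suffices for the serial case.

```python
def non_overlapped_count(events, episode):
    """Count non-overlapped occurrences of a serial episode: a single
    automaton advances through the episode's event types in order and
    resets after each complete occurrence, so the counted occurrences
    never interleave in time."""
    pos, count = 0, 0  # pos = next episode position awaited
    for ev in events:
        if ev == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count

# Hypothetical event stream: the serial episode A -> B occurs three
# times without interleaving.
freq = non_overlapped_count(list("ABCABAB"), list("AB"))  # freq == 3
```

A pattern would then be declared significant when this frequency exceeds the threshold derived from the null model, which is the quantity the paper computes for general partial orders.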

Relevance: 100.00%

Abstract:

Objectives

A P-value <0.05 is one metric used to evaluate the results of a randomized controlled trial (RCT). We wondered how often statistically significant results in RCTs may be lost with small changes in the numbers of outcomes.

Study Design and Setting

A review of RCTs in high-impact medical journals that reported a statistically significant result for at least one dichotomous or time-to-event outcome in the abstract. In the group with the smallest number of events, we changed the status of patients without an event to an event until the P-value exceeded 0.05. We labeled this number the Fragility Index; smaller numbers indicated a more fragile result.

Results

The 399 eligible trials had a median sample size of 682 patients (range: 15-112,604) and a median of 112 events (range: 8-5,142); 53% reported a P-value <0.01. The median Fragility Index was 8 (range: 0-109); 25% had a Fragility Index of 3 or less. In 53% of trials, the Fragility Index was less than the number of patients lost to follow-up.

Conclusion

The statistically significant results of many RCTs hinge on small numbers of events. The Fragility Index complements the P-value and helps identify less robust results. 
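The Fragility Index procedure described above can be sketched directly: convert non-events to events in the arm with fewer events until significance is lost. One caveat: the helper below uses a self-contained two-sided Fisher exact test, whereas the review worked from each trial's reported analysis, so treat this as an illustrative reimplementation rather than the study's exact code.

```python
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the probabilities of all tables no more likely than the observed."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Convert non-events to events in the arm with fewer events until the
    two-sided Fisher p-value exceeds alpha; the number of conversions is
    the Fragility Index (0 if the result is not significant to begin with)."""
    if e2 < e1:  # work on the arm with fewer events, as in the paper
        e1, n1, e2, n2 = e2, n2, e1, n1
    fi = 0
    while e1 + fi < n1 and fisher_p(e1 + fi, n1 - (e1 + fi), e2, n2 - e2) < alpha:
        fi += 1
    return fi
```

For a hypothetical trial with 1/20 events in one arm and 10/20 in the other, only a handful of reclassified patients erase the significance, which is exactly the fragility the review quantifies.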

Relevance: 100.00%

Abstract:

Heinz recently completed a comprehensive experiment in self-play using the FRITZ chess engine to establish the ‘decreasing returns’ hypothesis with specific levels of statistical confidence. This note revisits the results and recalculates the confidence levels of this and other hypotheses. These appear to be better than Heinz’ initial analysis suggests.


Relevance: 100.00%

Abstract:

The question as to whether it is better to diversify a real estate portfolio within a property type across the regions or within a region across the property types is one of continuing interest for academics and practitioners alike. The current study, however, differs from the usual sector/regional analysis in taking account of the fact that holdings in the UK real estate market are heavily concentrated in a single region, London. This study is therefore designed to investigate whether a real estate fund manager can obtain a statistically significant improvement in risk/return performance by extending a London-based portfolio first into the rest of the South East of England and then into the remainder of the UK, or whether the manager would be better off staying within London and diversifying across the various property types. The results indicate that staying within London and diversifying across the various property types may offer performance comparable with regional diversification, although this conclusion largely depends on the time period and the fund manager's ability to diversify efficiently.

Relevance: 100.00%

Abstract:

In this study, we estimate the statistical significance of structure prediction by threading. We introduce a single parameter ɛ that serves as a universal measure determining the probability that the best alignment is indeed a native-like analog. Parameter ɛ takes into account both length and composition of the query sequence and the number of decoys in threading simulation. It can be computed directly from the query sequence and potential of interactions, eliminating the need for sequence reshuffling and realignment. Although our theoretical analysis is general, here we compare its predictions with the results of gapless threading. Finally we estimate the number of decoys from which the native structure can be found by existing potentials of interactions. We discuss how this analysis can be extended to determine the optimal gap penalties for any sequence-structure alignment (threading) method, thus optimizing it to maximum possible performance.

Relevance: 70.00%

Abstract:

Background

The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods.

Methods

This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution.

Results

Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior.

Conclusion

In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
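The Dunn-Sidak adjustment used in the frequentist re-analysis is a one-line formula. The numbers below are hypothetical, but they show how sharply the adjusted significance depends on the guessed number m of silent comparisons, which is the sensitivity the paper highlights.

```python
def sidak_adjust(p, m):
    """Dunn-Sidak adjusted p-value: the probability that at least one of m
    independent comparisons reaches an unadjusted p-value this small."""
    return 1.0 - (1.0 - p) ** m

# A cluster with unadjusted p = 0.001 looks convincing on its own, but
# the adjusted value hinges entirely on the assumed number of silent
# comparisons (both m values below are illustrative guesses):
p_few = sidak_adjust(0.001, 10)     # ~0.010, still a notable excess
p_many = sidak_adjust(0.001, 1000)  # ~0.632, entirely unremarkable
```

Since m can only ever be a "guesstimate", the same cluster can be made significant or unremarkable by the choice of m, which is the paper's argument that the frequentist approach is no less subjective than the Bayesian one.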

Relevance: 70.00%

Abstract:

We consider the problem of detecting statistically significant sequential patterns in multineuronal spike trains. These patterns are characterized by ordered sequences of spikes from different neurons with specific delays between spikes. We have previously proposed a data-mining scheme to efficiently discover such patterns, which occur often enough in the data. Here we propose a method to determine the statistical significance of such repeating patterns. The novelty of our approach is that we use a compound null hypothesis that not only includes models of independent neurons but also models where neurons have weak dependencies. The strength of interaction among the neurons is represented in terms of certain pair-wise conditional probabilities. We specify our null hypothesis by putting an upper bound on all such conditional probabilities. We construct a probabilistic model that captures the counting process and use this to derive a test of significance for rejecting such a compound null hypothesis. The structure of our null hypothesis also allows us to rank-order different significant patterns. We illustrate the effectiveness of our approach using spike trains generated with a simulator.

Relevance: 70.00%

Abstract:

Interest in the development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific "hotspots" and "coldspots" of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where "hotspots" and "coldspots" are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger than the mean (3x effect size) could be defined as a "hotspot," and a location that is three times smaller than the mean (1/3x effect size) as a "coldspot." The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database.
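A minimal Monte-Carlo version of such a power calculation might look as follows. Everything here is assumed for illustration: counts per survey are taken as Poisson, the reference mean is treated as known, and a one-sided exact Poisson test is used to flag hotspots; the paper's actual sampling design and test may differ.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def hotspot_power(lam_ref, effect, n_surveys, alpha=0.05, reps=2000, seed=1):
    """Estimate the power to flag a spatial unit whose true mean abundance
    is effect * lam_ref, given n_surveys surveys of that unit and a
    one-sided Poisson test against the known reference mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = sum(poisson_sample(effect * lam_ref, rng)
                    for _ in range(n_surveys))
        if poisson_sf(total, n_surveys * lam_ref) < alpha:
            hits += 1
    return hits / reps
```

With a reference mean of 1 bird per survey and a 3x hotspot, 20 surveys per block give power close to 1 under these assumptions; shrinking the number of surveys lowers the power, which is precisely the sampling-effort trade-off the abstract describes.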

Relevance: 70.00%

Abstract:

In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are so severely influenced by the filtering of the data that the analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case meaningless. A possible consequence is that these methods can increase their power through the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. They indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise against using GSEA, GSEArot and GAGE on such data sets.